Abstract

Light Gradient Boosting Machine (LightGBM) and Random Forest (RF)
algorithms were used to predict the apparent diffusion coefficient of Se(IV) in
compacted bentonite. Seven instances of Se(IV) diffusion were measured using the
through-diffusion method. LightGBM (R² = 0.98 and RMSE = 0.025) exhibited superior
predictive accuracy with a training dataset consisting of 956 instances and eight input
features from the Japan Atomic Energy Agency diffusion database (JAEA-DDB). Shapley
Additive Explanation and Partial Dependence Plots analyses provided insights into the
diffusion mechanism of the adsorbed anion by evaluating the dependence of the apparent
diffusion coefficient on each input feature.


Keywords: Diffusion coefficient; Bentonite; Machine learning; Through-diffusion experiment.


1. Introduction 


Bentonite is commonly used as an engineered barrier in high-level radioactive
waste repositories to hinder the release of radionuclides. The transport of
radionuclides in compacted bentonite is governed by diffusion-based mass transport,
following Fick's law [1-3]. The retardation mechanism is controlled by the diffusion
and sorption of radionuclides [4]. The apparent diffusion coefficient and the effective
diffusion coefficient are two crucial parameters for characterizing Fickian diffusion [5].
The effective diffusion coefficient is calculated using Fick's first law, which involves
analyzing radionuclide diffusion at the steady-state stage. Since it takes a long time to
reach the steady state, the through-diffusion method is usually employed only for weakly
sorbing and non-sorbing radionuclides, such as Se(IV) [6], HTO [3,7-9], and
Eu(III)-EDTA⁻ [10]. In contrast, the apparent diffusion coefficient is measured at the
transient-state stage and considers the accessible porosity of bentonite in the
accumulation term of Fick's second law for adsorbed radionuclides [4,5,11,12]. The
in-diffusion method is commonly employed for strongly sorbing radionuclides, such as
U(VI) [13], Np(V) [14], and ¹³⁴Cs⁺ [15]. The relationship between the apparent diffusion
coefficient (Da) and the effective diffusion coefficient (De) can be expressed as
follows [3]:


$$
D_a = \frac{D_e}{\varepsilon + \rho_d K_d} = \frac{D_e}{\alpha} \tag{1}
$$

where $\varepsilon$, $\rho_d$, $K_d$, and $\alpha$ are the porosity, compacted dry density,
distribution coefficient, and rock capacity factor, respectively.
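As a quick numerical check of Eq. (1), the sketch below computes Da from De and α, and α from the porosity, dry density, and distribution coefficient. The function names are illustrative only. The sample values are those listed for Ba-bentonite at 1400 kg/m³ in Table 2, under the assumption that De is tabulated in units of 10⁻¹¹ m²/s.

```python
def rock_capacity_factor(porosity, dry_density, kd):
    """alpha = eps + rho_d * K_d (rho_d in kg/m^3, K_d in m^3/kg)."""
    return porosity + dry_density * kd


def apparent_diffusion(de, alpha):
    """D_a = D_e / alpha, Eq. (1)."""
    if alpha <= 0:
        raise ValueError("rock capacity factor must be positive")
    return de / alpha


# De and alpha as read from Table 2 (assumed scale: 1e-11 m^2/s for De).
da = apparent_diffusion(5.7e-11, 0.96)
print(f"Da = {da:.2e} m^2/s")
```

The result rounds to the 5.9 × 10⁻¹¹ m²/s listed for the same row of Table 2.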

Diffusion experiments are generally time-consuming and expensive ways of acquiring
diffusion coefficients. The effective diffusion coefficient is therefore often estimated
with diffusion models, such as empirical equations and numerical models, which enable
quick and cost-effective predictions. These models can quantitatively describe the
dependence of the effective diffusion coefficient of radionuclide species on mineral
characteristics (mineral composition, porosity, cation exchange capacity, external
surface area), solution properties (pH, Eh, mixed ions), and experimental conditions
(compacted dry density, temperature, ionic strength) [6,11,12,16,17]. However, the
predictive accuracy of these models remains unsatisfactory, which might be due to the
approximations and assumptions made during the modeling process. For example, the
effective diffusion coefficient predicted by Archie's equation deviated from the
experimental value by approximately 1-1.5 orders of magnitude, mainly due to such
approximations [18]. Numerical models, such as the integrated sorption and diffusion
model [19], multi-porosity model [20,21], and pore-scale model [22,23], rarely report a
prediction accuracy index, which might be due to their unsatisfactory accuracy. Since
in-diffusion experiments for strongly sorbing radionuclides are more complex than
through-diffusion experiments [3,14,15], few investigations have measured the apparent
diffusion coefficient. This parameter represents the coupled effect of radionuclide
sorption and diffusion. As a result, there is a lack of information regarding its
relationship with the aforementioned influencing factors.

Machine learning algorithms have been widely developed for regression prediction
[20,24,25]. Combined with model-interpretation techniques, such as Shapley Additive
Explanation, Individual Conditional Expectation, and Partial Dependence Plots analyses,
they can be used to visualize the nonlinear relationship between ion diffusion
coefficients and influencing factors [24,25]. Predictive accuracy depends on the dataset,
including the input features (influencing factors) and the data size. Previous
radionuclide diffusion experiments primarily focused on specific experimental conditions,
such as radionuclide species, compacted dry density, ionic strength, pH, temperature, and
bentonite origin [3,7-10,17,26,27]. However, most investigations considered only a
limited number of influencing factors. As a result, the data size decreases as the number
of input features increases, posing challenges for the application of machine learning.
This study examined the influence of input features and data size on predicting the
apparent diffusion coefficient of Se(IV) in compacted bentonite. It combined
through-diffusion experiments with two machine learning algorithms, namely Light Gradient
Boosting Machine (LightGBM) and Random Forest (RF), trained on a comprehensive diffusion
dataset. Furthermore, the diffusion mechanism was investigated by analyzing the weight of
influencing factors and the relationship between the apparent diffusion coefficient and
each influencing factor.


2. Materials and Methods 

2.1 Materials 

Gaomiaozi (GMZ) bentonite, obtained from the Beijing Research Institute of Uranium
Geology, originates from the Gaomiaozi Mine in Xinghe County (Inner Mongolia, China).
It was converted to Ba-bentonite by mixing BaCl₂ and GMZ powder in a mass ratio of 5%.
The detailed preparation procedure of Ba-bentonite followed a previous study [28]. The
bentonites were compacted into Ø 25.6 × 12.6 mm blocks with compacted dry densities of
1300-1700 kg/m³.


The stock solution of Se(IV) was prepared by dissolving SeO₂ powder (Sinopharm Reagent)
in 0.1-0.5 mol/L NaCl solution. The initial concentration of Se(IV) in the diffusion
experiments was (12.5 ± 1.0) × 10^? mol/L for Ba-bentonite and (17.9 ± 0.8) × 10^? mol/L
for GMZ bentonite. All solutions used in the experiments were prepared with ultrapure
water from a Milli-Q system (Millipore, USA). The concentration of Se was determined by
inductively coupled plasma optical emission spectrometry (Optima 7000DV, PerkinElmer,
USA).

2.2 Diffusion experiments 


Through-diffusion experiments were conducted under ambient conditions at room
temperature (22 ± 3 °C). The bentonite blocks were fully saturated with 0.1-0.5 mol/L
NaCl solution for five weeks. One side of each diffusion cell (x = 0) was connected to a
source reservoir filled with 200 mL of Se(IV) solution. The other side (x = L) was
connected to a target reservoir containing 10 mL of NaCl solution. Se(IV) diffused
through the 12.6 mm thick bentonite block and reached the target reservoir, whose
solution was regularly replaced with a fresh 10 mL of Se(IV)-free NaCl solution to keep
the concentration gradient constant.

2.3 Diffusion data processing 

Two sets of experimental data can be obtained: the accumulated mass (Acum) of Se(IV)
versus time and the flux (J(L,t)) versus time [3,22]. The first set is fitted with the
analytical solution of Fick's law given in [3] (Eq. (2)) to calculate the effective
diffusion coefficient (De) and the rock capacity factor (α). In that solution, S, L, and
C0 are the cross-sectional area of the bentonite block, the thickness of the block, and
the initial concentration of Se(IV) in the source reservoir, respectively.

The second set was used to verify the effective diffusion coefficient and the rock
capacity factor as follows:

$$
J(L,t) = \frac{1}{S} \frac{\partial A_{cum}}{\partial t} \tag{3}
$$
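One common way to extract De and α from the accumulated-mass curve is the time-lag method. At long times the standard through-diffusion solution approaches the straight line Acum ≈ S·L·C0·(De·t/L² − α/6), so the slope gives De and the intercept gives α. The sketch below fits that line by least squares and estimates the flux of Eq. (3) by finite differences. It is an illustration under this assumption, not necessarily the fitting procedure used in this study; the function names, the source concentration, and the "true" parameters of the synthetic data are hypothetical.

```python
import math


def fit_steady_state(times, masses, area, length, c0):
    """Fit A_cum ~ area*length*c0*(De*t/length**2 - alpha/6); return (De, alpha)."""
    n = len(times)
    t_mean = sum(times) / n
    a_mean = sum(masses) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (a - a_mean) for t, a in zip(times, masses))
    slope = sxy / sxx
    intercept = a_mean - slope * t_mean
    de = slope * length / (area * c0)
    alpha = -6.0 * intercept / (area * length * c0)
    return de, alpha


def flux(times, masses, area):
    """Finite-difference estimate of J(L,t) = (1/S) dA_cum/dt between samplings."""
    return [(a1 - a0) / (area * (t1 - t0))
            for t0, t1, a0, a1 in zip(times, times[1:], masses, masses[1:])]


# Synthetic steady-state data in SI units (hypothetical values).
area = math.pi * (0.0256 / 2) ** 2    # cross section of a 25.6 mm diameter block, m^2
length = 0.0126                       # block thickness, m
c0 = 1.0                              # source concentration, mol/m^3 (hypothetical)
true_de, true_alpha = 5.0e-11, 1.0    # hypothetical parameters used to build the data
times = [day * 86400.0 for day in range(15, 31)]
masses = [area * length * c0 * (true_de * t / length ** 2 - true_alpha / 6.0)
          for t in times]

de, alpha = fit_steady_state(times, masses, area, length, c0)
print(f"De = {de:.2e} m^2/s, alpha = {alpha:.2f}")
```

Only points from the linear (steady-state) part of the curve should be passed to the fit; including transient points biases both parameters.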
2.4 Machine learning database description and analysis 
The training dataset was sourced from the diffusion database of the Japan Atomic Energy
Agency (JAEA-DDB) (1982-2009) and from relevant literature on radionuclide diffusion in
bentonite (2010-2023) [8,9,21,22,28-32]. Because the influencing factors are incompletely
described in JAEA-DDB and the literature, the number of usable instances decreases as the
number of influencing factors increases, which significantly affects the predictive
performance of ML models. This study therefore examined both the number of instances and
the number of input features. Specifically, the datasets with three, five, and eight
input features contained 850, 820, and 739 instances from JAEA-DDB (J3, J5, and J8).
Additionally, 106 instances were collected from published literature, giving totals of
956, 926, and 845 instances for M3, M5, and M8, respectively. The dataset IM8 contained
956 instances with eight input features, with the missing values of M8 imputed using the
missForest method [33]. The statistical information related to the number of input
features and instances is summarized in Table 1. The input features included the
compacted dry density, ion diffusion coefficient in water, ionic strength, rock capacity
factor, distribution coefficient, montmorillonite content, ionic charge, and temperature.

Table S1 in the supporting information summarizes the montmorillonite content of the
various types of bentonite. Since no montmorillonite content was available in the
literature for Kunipia-P, it was assumed to be equal to that of Na-montmorillonite [34].
The output variable was the apparent diffusion coefficient. Both the ion diffusion
coefficient in water and the apparent diffusion coefficient were log-transformed to
enhance predictive accuracy.

To assess the predictive performance of the machine learning models, the root mean
square error (RMSE) and the coefficient of determination (R²) were used. These metrics
are calculated as follows:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{Log}D_{a,i}^{exp} - \mathrm{Log}D_{a,i}^{pred} \right)^{2}} \tag{4}
$$

$$
R^{2} = 1 - \frac{\sum_{i=1}^{N} \left( \mathrm{Log}D_{a,i}^{exp} - \mathrm{Log}D_{a,i}^{pred} \right)^{2}}{\sum_{i=1}^{N} \left( \mathrm{Log}D_{a,i}^{exp} - \mathrm{Log}D_{a,ave}^{exp} \right)^{2}} \tag{5}
$$

where $N$ is the number of samples, $\mathrm{Log}D_{a,i}^{exp}$ and $\mathrm{Log}D_{a,i}^{pred}$
are the logarithms of the experimental and predicted apparent diffusion coefficients, and
$\mathrm{Log}D_{a,ave}^{exp}$ is the average of the logarithms of the experimental apparent
diffusion coefficients. Models with high accuracy and performance are characterized by
low RMSE and high R².
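Eqs. (4) and (5) translate directly into code. The sketch below is a minimal standard-library version that operates on log-transformed values; the experimental Da values are those listed for Ba-bentonite in Table 2, while the predicted values are hypothetical numbers chosen only to exercise the functions.

```python
import math


def rmse(log_exp, log_pred):
    """Root mean square error between experimental and predicted Log Da, Eq. (4)."""
    n = len(log_exp)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(log_exp, log_pred)) / n)


def r_squared(log_exp, log_pred):
    """Coefficient of determination, Eq. (5)."""
    mean_exp = sum(log_exp) / len(log_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(log_exp, log_pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in log_exp)
    return 1.0 - ss_res / ss_tot


da_exp = [6.8e-11, 5.9e-11, 4.6e-11, 3.3e-11, 2.2e-11]   # Table 2, Ba-bentonite (m^2/s)
da_pred = [6.5e-11, 6.0e-11, 4.4e-11, 3.5e-11, 2.3e-11]  # hypothetical predictions
log_exp = [math.log10(d) for d in da_exp]
log_pred = [math.log10(d) for d in da_pred]
print(f"RMSE = {rmse(log_exp, log_pred):.3f}, R2 = {r_squared(log_exp, log_pred):.3f}")
```

Because both metrics are computed on logarithms, an RMSE of 0.025 corresponds to a typical error of about 6% in Da itself (10^0.025 ≈ 1.06).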


Table 1 Descriptive statistics of the dataset for each model.

Statistic  Model  Instance  ρd       LogDw  I        α      Kd            m      Z      T      LogDa
                  number    (kg/m³)  (-)    (mol/L)  (-)    (m³/kg)       (-)    (-)    (°C)   (-)
Mean       J3     850       1371     -8.78  0.24     -      -             -      -      -      -10.49
           M3     956       1368     -8.78  0.25     -      -             -      -      -      -10.48
           J5     820       1366     -8.79  0.24     124    0.08          -      -      -      -10.52
           M5     926       1363     -8.79  0.25     115    0.07          -      -      -      -10.50
           J8     739       1298     -8.78  0.18     137    0.09          0.85   0.32   29.31  -10.50
           M8     845       1304     -8.78  0.19     126    0.08          0.85   0.21   28.51  -10.49
           IM8    956       1368     -8.78  0.25     121    0.08          0.84   0.14   27.89  -10.48
Std        J3     850       446      0.20   0.69     -      -             -      -      -      0.93
           M3     956       439      0.19   0.66     -      -             -      -      -      0.91
           J5     820       446      0.21   0.70     1310   0.67          -      -      -      0.93
           M5     926       439      0.20   0.66     1235   0.63          -      -      -      0.90
           J8     739       389      0.21   0.43     1380   0.70          0.20   1.31   16.32  0.93
           M8     845       388      0.20   0.42     1292   0.66          0.20   1.29   15.54  0.90
           IM8    956       ?        ?      ?        ?      ?             ?      ?      ?      ?
Min        J3     850       400      -9.30  0        -      -             -      -      -      -15.55
           M3     956       400      -9.30  0        -      -             -      -      -      -15.55
           J5     820       400      -9.30  0        0.03   -3.8 × 10^?   -      -      -      -15.55
           M5     926       400      -9.30  0        0.02   -3.8 × 10^?   -      -      -      -15.55
           J8     739       400      -9.30  0        0.05   -3.8 × 10^?   0      -2     5      -15.55
           M8     845       400      -9.30  0        0.02   -3.8 × 10^?   0      -2     5      -15.55
           IM8    956       400      -9.30  0        0.01   -3.8 × 10^?   0      -2     5      -15.55
Max        J3     850       2730     -8.24  5        -      -             -      -      -      -8.97
           M3     956       2730     -8.24  5        -      -             -      -      -      -8.97
           J5     820       2730     -8.24  5        34877  17.40         -      -      -      -8.97
           M5     926       2730     -8.24  5        34877  17.40         -      -      -      -8.97
           J8     739       2330     -8.24  5        34877  17.40         1      5      90     -8.97
           M8     845       2330     -8.24  5        34877  17.40         1      ?      90     -8.97
           IM8    956       2730     -8.24  5        34877  17.40         1      ?      90     -8.97
Skw        J3     850       ?        ?      ?        -      -             -      -      -      ?
           M3     956       0.52     -0.22  6.10     -      -             -      -      -      -1.53
           J5     820       0.57     -0.17  5.92     23.41  22.01         -      -      -      -1.55
           M5     926       0.50     -0.18  6.02     24.79  22.89         -      -      -      -1.55
           J8     739       0.22     -0.14  7.92     22.24  20.91         ?      0.47   1.32   -1.65
           M8     845       0.13     -0.15  7.48     23.69  21.88         -2.10  0.60   1.46   -1.64
           IM8    956       0.52     ?      6.10     23.97  22.51         -2.01  0.63   1.62   -1.53

Std = Standard deviation; Skw = Skewness; - = feature not included in the model; ? = value not legible in the source.

3. Results and discussion
3.1. Measurement of the apparent diffusion coefficient by through-diffusion experiments

The through-diffusion method has been extensively used to study the diffusion of
anionic radionuclides because of their high diffusivity, which results from the anion
exclusion effect [9,10,22]. Se is a crucial radionuclide in repository safety
assessment. The primary form of Se(IV) is HSeO₃⁻ at pH 4 to 7 [35]. Fig. 1 illustrates
the breakthrough curves of HSeO₃⁻ in Ba-bentonite and GMZ bentonite. The accumulated
mass of HSeO₃⁻ increased linearly during the steady-state stage, which is related to the
effective diffusion coefficient. In contrast, the flux increased markedly during the
transient-state stage, which is linked to the rock capacity factor. The time taken to
reach the steady-state stage depends on the compacted dry density, increasing from about
10 days at 1300 kg/m³ to about 20 days at 1700 kg/m³.

Fig. 1 The accumulated mass (Acum) and flux (J(L,t)) of HSeO₃⁻ in compacted bentonite
as a function of time. (A and B) Ba-bentonite, I = 0.5 mol/L, pH = 3.1 ± 0.1,
T = 22 ± 3 °C. (C and D) GMZ, ρd = 1300 kg/m³, pH = 5.6 ± 0.1, T = 22 ± 3 °C.


Table 2 lists the diffusion parameters of HSeO₃⁻ in compacted Ba-bentonite and GMZ
bentonite. The apparent diffusion coefficient in Ba-bentonite decreased from
6.8 × 10⁻¹¹ m²/s to 2.2 × 10⁻¹¹ m²/s as the compacted dry density increased from
1300 kg/m³ to 1700 kg/m³. In addition, the apparent diffusion coefficient in GMZ
bentonite increased from 5.9 × 10⁻¹¹ m²/s to 6.7 × 10⁻¹¹ m²/s as the ionic strength
increased from 0.1 mol/L to 0.3 mol/L. These values are slightly lower than that reported
in synthetic pore water (Da = 7.8 × 10⁻¹¹ m²/s) in a previous study [36], which could be
attributed to the higher pH and the more complex ionic composition of the synthetic pore
water.


Table 2 Summary of diffusion parameters in Ba-bentonite and GMZ bentonite.

Bentonite     C0               I        ρd       De               α            Kd               Da
              (× 10^? mol/L)   (mol/L)  (kg/m³)  (× 10⁻¹¹ m²/s)   (-)          (× 10^? m³/kg)   (× 10⁻¹¹ m²/s)
Ba-bentonite  12.5 ± 1.0       0.5      1300     6.8 ± 0.6        1.00 ± 0.08  3.8 ± 0.3        6.8 ± 0.8
                               0.5      1400     5.7 ± 0.6        0.96 ± 0.08  3.5 ± 0.4        5.9 ± 0.8
                               0.5      1500     4.4 ± 0.4        0.96 ± 0.08  3.5 ± 0.6        4.6 ± 0.6
                               0.5      1600     3.0 ± 0.3        0.92 ± 0.08  3.3 ± 0.2        3.3 ± 0.4
                               0.5      1700     2.0 ± 0.2        0.90 ± 0.08  3.2 ± 0.4        2.2 ± 0.5
GMZ           17.9 ± 0.8       0.1      1300     5.6 ± 0.5        0.95 ± 0.08  3.2 ± 0.3        5.9 ± 0.7
                               0.3      1300     6.6 ± 0.5        0.98 ± 0.08  3.4 ± 0.3        6.7 ± 0.8


3.2. Prediction of machine learning algorithms 

Both LightGBM and RF are popular and powerful tree-based machine learning algorithms
for regression analysis. They have been applied in various diffusion studies, such as
predicting the effective diffusion coefficient of Re(VII) in compacted bentonite [20]
and the chloride diffusion coefficient in cements [25]. In this study, they were employed
to predict the apparent diffusion coefficient of HSeO₃⁻ in compacted bentonite.
Hyperparameters are values that are specified before training a machine learning model,
and optimal performance depends on successful hyperparameter optimization. Table 3 lists
the optimized hyperparameters for the LightGBM and RF algorithms.

Table 3 The optimal parameters for Light Gradient Boosting Machine and Random Forest
algorithms.

Algorithm  Parameter          J3     J5     J8     M3     M5     M8     IM8
LightGBM   Num_leaves         30     30     30     31     31     30     30
           Min_data_in_leaf   17     11     27     53     29     20     3
           Max_depth          3      6      3      -1     11     -1     5
           Learning_rate      0.30   0.30   0.25   0.03   0.01   0.10   0.26
           Num_boost_round    10000  10000  10000  10000  10000  10000  10000
           Feature_fraction   0.60   0.48   0.37   1      1      1      0.39
RF         Max_depth          None   8      6      None   None   15     7
           Min_samples_split  2      3      3      2      2      3      12
           Max_features       Auto   Auto   Auto   Auto   Auto   Auto   0.42
           N_estimators       10     10     3      14     10     12     3
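To make Table 3 concrete, the sketch below writes the IM8 column as keyword dictionaries using the parameter names of the LightGBM and scikit-learn Python packages. It assumes those packages were the implementations used, which the text does not state. The `objective` entry and the `train_models` helper are additions for illustration, and the third-party imports sit inside the helper so that the dictionaries can be inspected without either package installed.

```python
# Table 3, IM8 column, expressed with LightGBM parameter names.
LIGHTGBM_IM8 = {
    "objective": "regression",   # assumption: not listed in Table 3
    "num_leaves": 30,
    "min_data_in_leaf": 3,
    "max_depth": 5,
    "learning_rate": 0.26,
    "feature_fraction": 0.39,
}
NUM_BOOST_ROUND = 10000

# Table 3, IM8 column, expressed with scikit-learn RandomForestRegressor names.
RF_IM8 = {
    "max_depth": 7,
    "min_samples_split": 12,
    "max_features": 0.42,
    "n_estimators": 3,
}


def train_models(X, y):
    """Fit both models on features X and target y (Log Da).

    Hypothetical helper; requires the third-party lightgbm and scikit-learn packages.
    """
    import lightgbm as lgb
    from sklearn.ensemble import RandomForestRegressor

    booster = lgb.train(LIGHTGBM_IM8, lgb.Dataset(X, label=y),
                        num_boost_round=NUM_BOOST_ROUND)
    forest = RandomForestRegressor(**RF_IM8).fit(X, y)
    return booster, forest
```

In LightGBM a `max_depth` of -1, as listed for M3 and M8, means that tree depth is not limited.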


The predictions of the apparent diffusion coefficient by LightGBM and RF for the
different training sets (J3, J5, J8, M3, M5, M8, and IM8) are shown in Fig. 2.
Figs. 2A-2G show the prediction results of LightGBM and Figs. 2I-2O show those of RF.
Figs. 2H and 2P show the application of IM8 to predicting the Da values of ReO₄⁻,
HCrO₄⁻, I⁻, CeEDTA⁻, HTO, and Cs⁺ in compacted bentonite [37-40]. These species were
selected to represent monovalent anionic radionuclide species, neutral molecules, and
monovalent cations. Notably, the M-models exhibited lower RMSE and higher R² than the
corresponding J-models, indicating that a larger number of instances enhanced predictive
performance. The RMSE ranked in the order J3 > J5 > J8 and M3 > M5 > M8, while R² ranked
in the opposite order: J3 < J5 < J8 and M3 < M5 < M8. These rankings indicate that an
increased number of input features enhanced predictive performance [41]. The predictive
accuracy of IM8 was similar to that of M8 for the LightGBM algorithm, while IM8 was more
accurate than M8 for the RF algorithm.


For J3 and M3, both LightGBM and RF produced relatively low prediction accuracy, with
R² below 0.8, indicating that three input features were insufficient for accurate
prediction of the apparent diffusion coefficient of HSeO₃⁻ in compacted bentonite. When
the number of input features increased to five, the predictive accuracy of M5 using
LightGBM increased significantly, with R² of 0.939 and RMSE of 0.042. IM8 gave the
highest predictive accuracy for both LightGBM (R² = 0.98 and RMSE = 0.025) and RF
(R² = 0.95 and RMSE = 0.039). The application of IM8 to predicting the Da values of other
radionuclides also performed well, with R² of 0.79 and RMSE of 0.34 for LightGBM and R²
of 0.63 and RMSE of 0.45 for RF (Figs. 2H and 2P). This indicates that LightGBM
generalizes better than RF. LightGBM uses gradient-based one-side sampling, exclusive
feature bundling, and a histogram-based algorithm, which help to prevent overfitting,
accelerate training, and reduce memory consumption, thereby contributing to improved
predictive accuracy.

Fig. 2 The apparent diffusion coefficient of HSeO₃⁻ in compacted bentonite predicted
using Light Gradient Boosting Machine and Random Forest for various numbers of input
features and instances. The experimental Da values of ReO₄⁻, HCrO₄⁻, I⁻, CeEDTA⁻, HTO,
and Cs⁺ in Figs. 2H and 2P are from [37-40].

3.3. Spearman and Shapley Additive Explanation analyses 

Fig. 3 shows the correlation of each feature with Da in the training dataset using
Spearman analysis and the weight of the input features for IM8 using Shapley Additive
Explanation (SHAP) analysis. The J3-J8 and M3-IM8 models show similar Spearman's
correlation coefficients, indicating that the number of input features and instances had
an insignificant effect on the non-linear relationships (Fig. 3A). Four parameters
(compacted dry density, rock capacity factor, distribution coefficient, and ionic charge)
were negatively correlated with the apparent diffusion coefficient, three parameters (ion
diffusion coefficient in water, montmorillonite content, and temperature) were positively
correlated, and ionic strength had an insignificant impact. The rock capacity factor,
distribution coefficient, compacted dry density, and ionic charge had high absolute
values of Spearman's correlation coefficient, indicating strong correlations with the
apparent diffusion coefficient.

Fig. 3 (A) The correlation analysis of the training dataset and (B) the contribution of
input features to the predicted apparent diffusion coefficient.
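Spearman's coefficient is the Pearson correlation of the ranks, so it measures how monotonic a relationship is rather than how linear. The sketch below is a minimal standard-library version with average ranks for ties. The function names are illustrative, and the sample data are the dry densities and apparent diffusion coefficients listed for Ba-bentonite in Table 2.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


dry_density = [1300, 1400, 1500, 1600, 1700]   # kg/m^3, Table 2
da = [6.8, 5.9, 4.6, 3.3, 2.2]                 # 1e-11 m^2/s, Table 2
rho_s = spearman(dry_density, da)
print(f"Spearman coefficient = {rho_s:.2f}")
```

For these five points Da falls strictly with density, so the coefficient is at its lower bound of -1.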

The Shapley Additive Explanations (SHAP) method is commonly used to analyze the weight
or importance of input features in a prediction. The SHAP results for LightGBM and RF
gave approximately the same ranking (Fig. 3B); the main difference was that LightGBM
ranked the ion diffusion coefficient in water fifth, whereas RF ranked it fourth. The top
two input features for predicting the apparent diffusion coefficient were the rock
capacity factor and compacted dry density, which together contributed 53.3% for LightGBM
and 53.7% for RF of the total contribution of all features in IM8.
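The idea behind these percentages can be illustrated with exact Shapley values on a toy model. The sketch below averages each feature's marginal contribution over all feature orderings relative to a single baseline input, then converts mean absolute values into percentage shares. This is a simplified stand-in for the SHAP analysis in the text, not the tree-specific algorithm of the SHAP library; the toy model, its coefficients, and the function names are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for j in order:
            current[j] = x[j]              # switch feature j from baseline to x
            value = model(current)
            phi[j] += value - previous     # marginal contribution of feature j
            previous = value
    return [p / len(orderings) for p in phi]


def importance_shares(phi_rows):
    """Percentage share of each feature from mean absolute Shapley values."""
    n = len(phi_rows[0])
    mean_abs = [sum(abs(row[j]) for row in phi_rows) / len(phi_rows) for j in range(n)]
    total = sum(mean_abs)
    return [100.0 * m / total for m in mean_abs]


def toy_model(features):
    """Hypothetical response in scaled units: (capacity factor, density, temperature)."""
    alpha, rho, temp = features
    return -0.8 * alpha - 0.5 * rho + 0.1 * temp - 0.2 * alpha * rho


phi = shapley_values(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
shares = importance_shares([phi])
print([round(p, 3) for p in phi], [round(s, 2) for s in shares])
```

The Shapley values always sum to the difference between the prediction and the baseline prediction, which is what allows them to be read as a decomposition of the prediction.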

The montmorillonite content and ionic strength are two important parameters that have
been extensively investigated, since they have a significant influence on the effective
diffusion coefficient of anionic radionuclides [8,22,42]. However, the SHAP analysis
shows that the montmorillonite content and ionic strength made an insignificant
contribution to the prediction of the apparent diffusion coefficient (Fig. 3B). The
result for ionic strength is consistent with [9], which reported that the insignificant
influence of ionic strength on the apparent diffusion coefficients of Cs⁺ and Na⁺ was
attributable to the coupled effects of diffusion and sorption. Further studies are needed
to verify these results, as the weight of input features is influenced by the types and
numbers of instances and by the machine learning algorithm.


3.4. Partial Dependence Plots analysis 

Partial Dependence Plots (PDP) visualize the relationship between each input feature
and the predicted apparent diffusion coefficient. Fig. 4 shows the contribution of each
input feature to the prediction for IM8 using PDP analysis. The gray columns show the
distribution of data points for each input feature. The PDP values of LightGBM are
represented by solid lines, and those of RF by dashed lines. A flat curve suggests that
the feature has an insignificant influence on the predicted outcome, whereas a steep
curve indicates a stronger relationship. LightGBM and RF show similar trends in the
dependence of the predicted apparent diffusion coefficient on each input feature.

Fig. 4 Partial Dependence Plots (PDP) analysis based on the Light Gradient Boosting
Machine (LightGBM) and Random Forest (RF) applied to the prediction of the apparent
diffusion coefficient: (A) compacted dry density, (B) temperature, (C) montmorillonite
content, (D) ionic strength, (E) rock capacity factor, (F) distribution coefficient,
(G) ion diffusion coefficient in water, and (H) ionic charge.
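A partial dependence curve is computed by fixing one feature at each grid value in every row of the dataset and averaging the model's predictions. The sketch below shows that calculation for any callable model. The linear toy model, its coefficients, and the three data rows are hypothetical and serve only to make the output easy to verify.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction with column `feature` forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value      # overwrite the feature of interest
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical Log Da as a function of (dry density in kg/m^3, temperature in C)."""
    density, temperature = row
    return -10.0 - 0.001 * density + 0.01 * temperature


rows = [[1300.0, 25.0], [1500.0, 40.0], [1700.0, 60.0]]
density_curve = partial_dependence(toy_model, rows, 0, [1300.0, 1500.0, 1700.0])
temperature_curve = partial_dependence(toy_model, rows, 1, [25.0, 60.0])
print(density_curve, temperature_curve)
```

For a model without interactions the curve reproduces the feature's own effect, here a slope of -0.001 per kg/m³; for tree ensembles it averages over whatever interactions the model has learned.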


The relationships between the predicted apparent diffusion coefficient and the input
features related to experimental conditions were consistent with published experimental
results [3,13,26]. Specifically, the predicted apparent diffusion coefficient was
negatively correlated with the compacted dry density (Fig. 4A), which can be explained by
the decrease in the pore space available for radionuclide diffusion with increasing
compaction. At compacted dry densities above 1600 kg/m³, the decrease slowed, which is
attributed to the reduction of the interlayer pores to one water layer [21,42,43].

The apparent diffusion coefficient exhibited a positive correlation with temperature
(Fig. 4B). In a repository, the temperature of the clay close to the waste containers
could exceed 100 °C [44]. A positive relationship between the effective diffusion
coefficient and temperature, following the Arrhenius equation, has been reported for HTO,
³⁶Cl⁻, and ReO₄⁻ [3,7,26]. Figs. 4C and 4D show that the apparent diffusion coefficient
exhibited insignificant correlations with montmorillonite content and ionic strength.
Slight negative and positive effects were observed for montmorillonite contents ranging
from 0.5 to 0.8 and ionic strengths below 1.0 mol/L, respectively, which agree with their
correlations with the effective diffusion coefficient [20]. Many studies have shown that
the effective diffusion coefficient of radionuclides increases with ionic strength
[8,22]. This tendency has been explained by the decrease in the thickness of the
electrical double layer in high-salinity solutions [9,22]. In general, the consistency
between PDP and experimental results implies that PDP analysis can provide insights into
the diffusion behavior and diffusion mechanism of radionuclides.
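For reference, the Arrhenius dependence cited above for the effective diffusion coefficient has the standard form below. The symbols $D_0$ (pre-exponential factor), $E_a$ (activation energy), and $R$ (gas constant) are not defined in the text and are introduced here for illustration; $T$ is the absolute temperature.

```latex
D(T) = D_0 \exp\!\left(-\frac{E_a}{R T}\right),
\qquad
\ln\frac{D(T_2)}{D(T_1)} = \frac{E_a}{R}\left(\frac{1}{T_1} - \frac{1}{T_2}\right)
```

For a positive activation energy the right-hand side is positive whenever $T_2 > T_1$, which is the positive temperature trend seen in Fig. 4B.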

The remaining input features, namely the rock capacity factor, distribution coefficient,
ion diffusion coefficient in water, and ionic charge, are related to the properties of
the radionuclides. The dependence of the apparent diffusion coefficient on these features
has been unclear because of the coupled effects of the solid, the liquid, and the
radionuclides. Fig. 4E indicates that the rock capacity factor had a negative impact on
the prediction, in line with Eq. (1), according to which the apparent diffusion
coefficient is inversely proportional to the rock capacity factor. The negative
relationship between the apparent diffusion coefficient and the distribution coefficient
(Fig. 4F) can be attributed to the linear relationship between the rock capacity factor
and the distribution coefficient. The predicted apparent diffusion coefficient increased
with increasing ion diffusion coefficient in water (Fig. 4G), indicating that
radionuclides that diffuse quickly in the liquid also diffuse quickly in the solid. The
RF algorithm shows that the ionic charge had an insignificant effect on the apparent
diffusion coefficient, whereas the LightGBM algorithm shows that it decreased as the
ionic charge changed from 0 to +3 (Fig. 4H).


4. Conclusions 


The effect of input features and instances on the prediction of the apparent diffusion
coefficient was investigated using Light Gradient Boosting Machine (LightGBM) and Random
Forest (RF) algorithms. Diffusion experiments with HSeO₃⁻ (as a surrogate for ⁷⁹HSeO₃⁻)
in compacted bentonite were conducted using the through-diffusion method to test the
predictive performance. Increasing the number of input features resulted in a decrease in
the number of instances. LightGBM (R² = 0.98 and RMSE = 0.025) and RF (R² = 0.95 and
RMSE = 0.039) exhibited the highest predictive accuracy for the training set of 956
instances and eight input features: the compacted dry density, ion diffusion coefficient
in water, ionic strength, rock capacity factor, distribution coefficient, montmorillonite
content, ionic charge, and temperature.


Shapley Additive Explanations showed that the top two input features for predicting the
apparent diffusion coefficient were the rock capacity factor and compacted dry density.
Partial Dependence Plots indicated that the apparent diffusion coefficient depended
negatively on the rock capacity factor, compacted dry density, and distribution
coefficient, and positively on the ion diffusion coefficient in water. This study
presents a method for predicting the apparent diffusion coefficient, quantifying the
influencing factors, and understanding the effect of each input feature on the apparent
diffusion coefficient. The resulting insight into the diffusion mechanism is beneficial
for the safety assessment of repositories.